Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction

نویسندگان

چکیده

Video memorability prediction aims to quantify the credibility of being remembered according video content, which provides significant value in advertising design, social media recommendation, and other applications. However, main attributes that affect have not been determined so making design model more challenging. Therefore, this study, we analyze experimentally verify how select most impact factors predict memorability. Furthermore, a new framework, Adaptive Multi-modal Ensemble Network, based on chosen vital efficiently. Specifically, first conduct three memorability, i.e., temporal 3D information, spatial information semantics derived from video, image caption, respectively. Then, Network integrates individual base learners (i.e., ResNet3D, Deep Random Forest Multi-Layer Perception) into weighted ensemble framework score In addition, also an adaptive learning strategy update weights importance is predicted by rather than assigning manually. Finally, experiments public VideoMem dataset demonstrate proposed method competitive results high efficiency for prediction.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-modal Target Prediction

Users with severe motor impairment often depends on alternative input devices like eye-gaze or head movement trackers to access computers. However these devices are not as fast as computer mouse and often turn difficult to use. We have proposed a Neural-network based model that can predict pointing target by analyzing pointing trajectory. We have validated the model for standard computer mouse,...

متن کامل

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. ‎In this regard, an important issue is that preparation of some features may need conducting difficult and costly experiments while these features have less significant impacts on the final decision and can be ignored from the feature set‎. ‎Therefore‎, ‎developing a machine for p...

متن کامل

Adaptive Multi-modal Sensors

Compressing real-time input through bandwidth constrained connections has been studied within robotics, wireless sensor networks, and image processing. When there are bandwidth constraints on realtime input the amount of information to be transferred will always be greater than the amount that can be transferred per unit of time. We propose a system that utilizes a local diffusion process and a...

متن کامل

Multi-Modal Tracking for Video Compression

This paper describes a system which uses multiple visual processes to detect and track faces for video compression and transmission. The system is based on an architecture in which a supervisor selects and activates visual processes in cyclic manner. Control of visual processes is made possible by a confidence factor which accompanies each observation. Fusion of results into a unified estimatio...

متن کامل

Multi-modal Aggregation for Video Classification

In this paper, we present a solution to Large-Scale Video Classification Challenge (LSVC2017) [1] that ranked the 1st place. We focused on a variety of modalities that cover visual, motion and audio. Also, we visualized the aggregation process to better understand how each modality takes effect. Among the extracted modalities, we found Temporal-Spatial features calculated by 3D convolution quit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Applied sciences

سال: 2022

ISSN: ['2076-3417']

DOI: https://doi.org/10.3390/app12178599